Compositional segmentation and long-range fractal correlations in DNA sequences.

Authors

  • Bernaola-Galván
  • Román-Roldán
  • Oliver
Abstract

A segmentation algorithm based on the Jensen-Shannon entropic divergence is used to decompose long-range correlated DNA sequences into statistically significant, compositionally homogeneous patches. By appropriately setting the significance level for segmenting the sequence, the underlying power-law distribution of patch lengths can be revealed. Some of the identified DNA domains were uncorrelated, but most of them continued to display long-range correlations even after several steps of recursive segmentation, thus indicating a complex multi-length-scaled structure for the sequence. On the other hand, by separately shuffling each segment, or by randomly rearranging the order in which the different segments occur in the sequence, shuffled sequences preserving the original statistical distribution of patch lengths were generated. Both types of random sequences displayed the same correlation scaling exponents as the original DNA sequence, thus demonstrating that neither the internal structure of patches nor the order in which these are arranged in the sequence is critical; therefore, long-range correlations in nucleotide sequences seem to rely only on the power-law distribution of patch lengths. [S1063-651X(96)05905-3]
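
The recursive procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the fixed `threshold` on the divergence (a stand-in for the statistical significance level mentioned above), and the `min_len` guard are all hypothetical choices made here.

```python
import math
from collections import Counter


def shannon_entropy(counts, total):
    """Shannon entropy in bits of a composition given its symbol counts."""
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def best_cut(seq):
    """Scan every cut position and return (index, divergence) of the
    position that maximizes the Jensen-Shannon divergence between the
    left and right subsequences."""
    n = len(seq)
    total = Counter(seq)
    h_whole = shannon_entropy(total, n)
    left = Counter()
    best_i, best_d = 0, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # Counter subtraction keeps positive counts only
        d = (h_whole
             - (i / n) * shannon_entropy(left, i)
             - ((n - i) / n) * shannon_entropy(right, n - i))
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d


def segment(seq, threshold=0.1, min_len=10, offset=0):
    """Recursively split seq at the best cut while the divergence stays
    above `threshold` (assumed stand-in for a significance test).
    Returns the cut positions as indices into the original sequence."""
    if len(seq) < 2 * min_len:
        return []
    i, d = best_cut(seq)
    if d < threshold or i < min_len or len(seq) - i < min_len:
        return []
    return (segment(seq[:i], threshold, min_len, offset)
            + [offset + i]
            + segment(seq[i:], threshold, min_len, offset + i))
```

For example, `segment("A" * 40 + "C" * 40 + "A" * 40)` finds the two patch boundaries, while a compositionally homogeneous input such as `"A" * 100` yields no cuts. The lengths of the resulting patches are the quantity whose distribution the abstract reports as following a power law.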

Similar resources

Compositional complexity of DNA sequence models

Recently, we proposed a new measure of complexity for symbolic sequences (Sequence Compositional Complexity, SCC) based on the entropic segmentation of a sequence into compositionally homogeneous domains. Such segmentation is carried out by means of a conceptually simple, computationally efficient heuristic algorithm. SCC is now applied to the sequences generated by several stochastic models wh...

Biological origins of long-range correlations and compositional variations in DNA.

The occurrence of certain long-range correlations between nucleotides in DNA sequences of living organisms has recently been reported. The biological origin of these correlations was unknown. The correlations were proposed to be concerned with fractal structure and differences between intron-containing and intron-less sequences. We and others have reported that no consistent difference exists b...

SEGMENT: identifying compositional domains in DNA sequences

MOTIVATION: DNA sequences are formed by patches or domains of different nucleotide composition. In a few simple sequences, domains can simply be identified by eye; however, most DNA sequences show a complex compositional heterogeneity (fractal structure), which cannot be properly detected by current methods. Recently, a computationally efficient segmentation method to analyse such nonstationary ...

Mutual information for examining correlations in DNA

This paper examines two methods for finding whether long-range correlations exist in DNA: a fractal measure and a mutual information technique. We evaluate the performance and implications of these methods in detail. In particular we explore their use comparing DNA sequences from a variety of sources. Using software for performing in silico mutations, we also consider evolutionary events leadin...

Fractal landscapes in biological systems: long-range correlations in DNA and interbeat heart intervals.

Here we discuss recent advances in applying ideas of fractals and disordered systems to two topics of biological interest, both topics having in common the appearance of scale-free phenomena, i.e., correlations that have no characteristic length scale, typically exhibited by physical systems near a critical point and dynamical systems far from equilibrium. (i) DNA nucleotide sequences have tradit...

Journal:
  • Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics

Volume 53, Issue 5

Pages: -

Publication date: 1996